Solution to the Land Classification Challenge

In [1]:
import numpy as np
import pandas as pd
In [2]:
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
In [73]:
train_path = '/home/gaurav/Desktop/Social Cops/socialcops_challenge/socialcops_challenge/land_train.csv'
test_path  = '/home/gaurav/Desktop/Social Cops/socialcops_challenge/socialcops_challenge/land_test.csv'
data = pd.read_csv(train_path)
test = pd.read_csv(test_path)
In [4]:
print("The first 5 rows of the training dataset\n")
data.head(5)
The first 5 rows of the training dataset

Out[4]:
X1 X2 X3 X4 X5 X6 I1 I2 I3 I4 I5 I6 target
0 323 229 120 517 209 115 0.623234 -1.047476 1.473405 0.380537 -0.021277 0.424242 1
1 335 220 109 387 149 89 0.560484 -1.004514 1.200777 0.324813 -0.101010 0.444030 1
2 255 150 52 184 72 45 0.559322 -0.996822 0.825000 0.300728 -0.072165 0.437500 1
3 254 182 73 413 156 84 0.699588 -1.151258 1.425354 0.436268 0.070064 0.451670 1
4 257 219 100 722 254 130 0.756691 -1.236199 1.990973 0.506155 0.130435 0.479508 1
In [5]:
print("Basic summary of the training dataset")
data.describe()
Basic summary of the training dataset
Out[5]:
X1 X2 X3 X4 X5 X6 I1 I2 I3 I4 I5 I6 target
count 110000.000000 110000.000000 110000.000000 110000.000000 110000.000000 110000.000000 110000.000000 110000.000000 110000.000000 110000.000000 110000.000000 110000.000000 110000.000000
mean 505.820582 778.290627 819.375336 1752.844418 1078.027418 739.470318 0.384085 -0.712814 1.615682 0.304885 -0.046779 0.328729 1.909091
std 428.347173 529.490028 674.422188 693.583645 868.842300 797.061740 0.387576 0.472998 1.652829 0.262865 0.421622 0.268098 0.995864
min 0.000000 79.000000 0.000000 77.000000 52.000000 25.000000 -0.517672 -1.566332 -2.842839 -0.390293 -0.904241 -0.822989 1.000000
25% 172.000000 333.000000 211.000000 1209.000000 256.000000 190.000000 -0.006770 -1.212487 -0.042051 0.041264 -0.474310 0.035974 1.000000
50% 540.000000 809.500000 817.000000 1785.000000 863.000000 346.000000 0.222476 -0.681110 1.048291 0.210903 0.170455 0.394099 2.000000
75% 699.000000 1022.000000 1237.000000 2148.000000 1601.000000 1245.000000 0.810632 -0.186692 3.352103 0.587857 0.257062 0.463504 3.000000
max 9346.000000 9615.000000 9877.000000 9316.000000 8249.000000 7180.000000 1.000000 1.144574 5.833693 0.930116 1.000000 0.881131 4.000000
In [6]:
data.info()  # check for null values
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 110000 entries, 0 to 109999
Data columns (total 13 columns):
X1        110000 non-null int64
X2        110000 non-null int64
X3        110000 non-null int64
X4        110000 non-null int64
X5        110000 non-null int64
X6        110000 non-null int64
I1        110000 non-null float64
I2        110000 non-null float64
I3        110000 non-null float64
I4        110000 non-null float64
I5        110000 non-null float64
I6        110000 non-null float64
target    110000 non-null int64
dtypes: float64(6), int64(7)
memory usage: 10.9 MB

Clearly there are no missing values in the data!
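Besides `data.info()`, the absence of missing values can be confirmed explicitly with `isnull()`; a minimal sketch on a hypothetical toy frame standing in for the training data:

```python
import pandas as pd

# Toy stand-in for the training data (hypothetical values)
df = pd.DataFrame({"X1": [323, 335, 255],
                   "I1": [0.62, 0.56, 0.56],
                   "target": [1, 1, 1]})

# Count missing entries per column; an all-zero result means no NaNs anywhere
missing = df.isnull().sum()
print(missing)
```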

Plotting the distribution of each column (feature)

In [7]:
sns.set(style='whitegrid')
fig, axis = plt.subplots(nrows=1, ncols=6, figsize = (25,4))
for i in range(6):
    axis[i].set_title("Distribution Plot {}".format('X'+str(i+1)))
    sns.distplot(data['X'+str(i+1)], ax = axis[i])
    
plt.show()
    

The distribution plots suggest that X1, X2 and X3 are strongly correlated, since their distributions are very similar to one another.
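Similar marginal distributions alone do not prove correlation, so it helps to quantify the relationship with `corr()`; a sketch on synthetic data with the same kind of linear dependence (variable names and coefficients are illustrative, not fitted to the real data):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x1 = rng.normal(500, 400, size=2000)
df = pd.DataFrame({
    "X1": x1,
    "X2": 1.2 * x1 + rng.normal(0, 60, size=2000),  # noisy linear function of X1
    "X3": 1.5 * x1 + rng.normal(0, 90, size=2000),
})

# Pearson correlation matrix; values near 1 confirm strong linear dependence
corr = df.corr()
print(corr.round(3))
```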

In [9]:
sns.lmplot('X1','X2', data=data, hue='target',fit_reg=True, size=5)
sns.lmplot('X1','X3', data=data, hue='target',fit_reg=True, size=5)
sns.lmplot('X2','X3', data=data, hue='target',fit_reg=True, size=5) 
Out[9]:
<seaborn.axisgrid.FacetGrid at 0x7f36f7382dd8>

From the plots above it can be seen that X1, X2 and X3 are pairwise correlated, and the plot of X2 against X3 shows a very high positive correlation irrespective of class.

In [10]:
sns.set(style='whitegrid')
fig, axis = plt.subplots(nrows=1, ncols=6, figsize = (25,4))
for i in range(6):
    axis[i].set_title("Distribution Plot {}".format('I'+str(i+1)))
    sns.distplot(data['I'+str(i+1)], ax = axis[i])
    
plt.show()

Similarly, I2, I3 and I4 also appear to be correlated.

In [11]:
sns.lmplot('I1','I3', data=data, hue='target',fit_reg=True, size=5)
sns.lmplot('I1','I4', data=data, hue='target',fit_reg=True, size=5)
sns.lmplot('I3','I4', data=data, hue='target',fit_reg=True, size=5)
Out[11]:
<seaborn.axisgrid.FacetGrid at 0x7f36f7480908>

The scatter plot of I1 against I4 shows high correlation irrespective of class.

In [12]:
sns.pairplot(data, hue="target", size=3)
Out[12]:
<seaborn.axisgrid.PairGrid at 0x7f36f9927e10>

Plotting the Correlation HeatMap

In [13]:
# Heatmap to visualize correlations between features
f, ax = plt.subplots(figsize=(13,13))
sns.heatmap(data.corr(), annot=True)
plt.show()

Feature Selection

Feature importance

In [4]:
from sklearn.ensemble import ExtraTreesClassifier
In [5]:
model = ExtraTreesClassifier()
model.fit(data.values[:,:-1],data.values[:,-1])
print(model.feature_importances_)
[ 0.014944    0.02355257  0.07072214  0.08200373  0.03503455  0.01738636
  0.15874432  0.05372846  0.21649554  0.1034142   0.16211497  0.06185915]
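The raw importance array is easier to read when paired with the feature names; a sketch using the importances printed above (column order assumed to be X1..X6 followed by I1..I6):

```python
import pandas as pd

names = [f"X{i}" for i in range(1, 7)] + [f"I{i}" for i in range(1, 7)]
importances = [0.0149, 0.0236, 0.0707, 0.0820, 0.0350, 0.0174,
               0.1587, 0.0537, 0.2165, 0.1034, 0.1621, 0.0619]

# Rank the features from most to least important
ranking = pd.Series(importances, index=names).sort_values(ascending=False)
print(ranking)
```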

Feature selection by comparing explained feature variance

In [4]:
from sklearn.preprocessing import StandardScaler

# Separating out the features (note: this slice keeps all 13 columns,
# including target, which is why 13 components appear below)
x = data.iloc[:,:].values

# Separating out the target
y = data.loc[:,['target']].values

# Standardizing the features
x = StandardScaler().fit_transform(x)
In [5]:
from sklearn.decomposition import PCA
pca1 = PCA()
x = pca1.fit_transform(x)
pca1.explained_variance_ratio_
Out[5]:
array([  5.53690090e-01,   3.21493793e-01,   9.34671164e-02,
         1.28973544e-02,   1.08640457e-02,   5.35851892e-03,
         1.13210115e-03,   4.34720750e-04,   3.37599001e-04,
         2.45468002e-04,   4.64809338e-05,   3.27124288e-05,
         1.36582054e-16])
In [6]:
from sklearn.preprocessing import StandardScaler

# Separating out the features
x = data.iloc[:,:6].values

# Separating out the target
y = data.loc[:,['target']].values

# Standardizing the features
x = StandardScaler().fit_transform(x)
In [7]:
from sklearn.decomposition import PCA
pca2 = PCA()
x = pca2.fit_transform(x)
pca2.explained_variance_ratio_
Out[7]:
array([  6.92804284e-01,   2.36896133e-01,   5.86677800e-02,
         9.87730812e-03,   1.06578369e-03,   6.88711498e-04])
In [8]:
from sklearn.preprocessing import StandardScaler
# Separating out the features
x = data.iloc[:,6:-1].values

# Separating out the target
y = data.loc[:,['target']].values

# Standardizing the features
x = StandardScaler().fit_transform(x)
In [9]:
from sklearn.decomposition import PCA
pca3 = PCA()
x = pca3.fit_transform(x)
pca3.explained_variance_ratio_
Out[9]:
array([  7.14342761e-01,   2.81387161e-01,   3.24158824e-03,
         6.32776628e-04,   3.95713179e-04,   2.95940822e-16])

Explained Variance

The explained variance tells us how much information (variance) can be attributed to each of the principal components.

The last principal component explains essentially zero variance (~1e-16), which indicates that one column is almost a linear combination of the others; I6 carries the least independent information, so we drop it.
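A near-zero last ratio is the signature of a linearly dependent column; the effect can be reproduced on synthetic data where one column is an exact linear combination of the others (the coefficients below are arbitrary):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
base = rng.normal(size=(500, 3))
# fourth column is an exact linear combination of the first three
x = np.column_stack([base, base @ np.array([0.5, -0.2, 1.0])])

# the redundant column shows up as a (numerically) zero last variance ratio
ratios = PCA().fit(StandardScaler().fit_transform(x)).explained_variance_ratio_
print(ratios)
```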

In [10]:
data = data.drop(labels='I6',axis=1)

Dealing with Outliers

In [11]:
from scipy import stats
import numpy as np

z = np.abs(stats.zscore(data.iloc[:,:-1]))
print(z)
[[ 0.42680663  1.03740032  1.03700401 ...,  0.08608109  0.28780092
   0.06048688]
 [ 0.39879184  1.05439788  1.05331435 ...,  0.2510282   0.07581156
   0.12862552]
 [ 0.58555708  1.18660116  1.13783152 ...,  0.47838305  0.01581214
   0.06021041]
 ..., 
 [ 0.22920629  0.39417234  0.56289131 ...,  0.50064517  0.5028006
   0.3226388 ]
 [ 0.12415084  0.26574629  0.43389141 ...,  0.45663443  0.44081277
   0.43663716]
 [ 0.06111757  0.20153327  0.35382251 ...,  0.41899204  0.38525141
   0.52993914]]
In [12]:
filtered = data[(z < 3).all(axis=1)]
In [13]:
filtered.shape
Out[13]:
(107515, 12)
In [14]:
data.shape
Out[14]:
(110000, 12)
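The filter above keeps only rows where every feature lies within 3 standard deviations of its column mean; a minimal sketch on a one-column toy frame with an injected outlier:

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)
values = np.concatenate([rng.normal(0, 1, size=50), [20.0]])  # one extreme outlier
df = pd.DataFrame({"a": values})

# keep rows whose absolute z-score is below 3 in every column
z = np.abs(stats.zscore(df))
filtered = df[(z < 3).all(axis=1)]
print(len(df), "->", len(filtered))
```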

Plotting a histogram to visualize the number of examples in each target class

In [15]:
plt.hist(filtered.iloc[:,-1])
Out[15]:
(array([ 49919.,      0.,      0.,  29954.,      0.,      0.,  18322.,
             0.,      0.,   9320.]),
 array([ 1. ,  1.3,  1.6,  1.9,  2.2,  2.5,  2.8,  3.1,  3.4,  3.7,  4. ]),
 <a list of 10 Patch objects>)
In [16]:
filtered['target'].value_counts()
Out[16]:
1    49919
2    29954
3    18322
4     9320
Name: target, dtype: int64

Training and Validation Splitting

In [17]:
from sklearn.model_selection import train_test_split
X_train, X_val, y_train, y_val = train_test_split( filtered.iloc[:,:-1],filtered.iloc[:,-1],test_size=0.25, random_state=1)
print(X_train.shape)
print(X_val.shape)
print(y_train.shape)
print(y_val.shape)
(80636, 11)
(26879, 11)
(80636,)
(26879,)

Balancing the dataset

Using SMOTE - Synthetic Minority OverSampling Method

In [18]:
from imblearn.over_sampling import SMOTE
In [19]:
sampler = SMOTE()
X_train, y_train = sampler.fit_resample(X_train,y_train)
In [20]:
plt.hist(y_train)
Out[20]:
(array([ 37368.,      0.,      0.,  37368.,      0.,      0.,  37368.,
             0.,      0.,  37368.]),
 array([ 1. ,  1.3,  1.6,  1.9,  2.2,  2.5,  2.8,  3.1,  3.4,  3.7,  4. ]),
 <a list of 10 Patch objects>)

One-hot encoding the target labels

In [21]:
from keras.utils import np_utils
Using TensorFlow backend.
In [22]:
y_train  = np_utils.to_categorical(y_train)
y_val  = np_utils.to_categorical(y_val)

print(y_train)
print(y_train.shape)
print(y_val.shape)
[[ 0.  1.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.]
 ..., 
 [ 0.  0.  0.  0.  1.]
 [ 0.  0.  0.  0.  1.]
 [ 0.  0.  0.  0.  1.]]
(149472, 5)
(26879, 5)
In [23]:
# classes run 1..4, so column 0 of the one-hot matrix is always zero; drop it
y_train = y_train[:,1:]
y_val = y_val[:,1:]
In [24]:
print(y_train.shape)
print(y_val.shape)
(149472, 4)
(26879, 4)
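Because the labels run 1..4, `to_categorical` produces a 5-column matrix whose class-0 column is always zero, which is why that column is sliced off above. A NumPy sketch of the same idea (without the Keras dependency):

```python
import numpy as np

y = np.array([1, 2, 4, 1])          # labels start at 1, like the target column
onehot = np.eye(y.max() + 1)[y]     # 5 columns for classes 0..4
print(onehot)

# class 0 never occurs, so its column is all zeros and can be dropped
trimmed = onehot[:, 1:]
print(trimmed.shape)
```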

Standardizing the Data

In [25]:
from sklearn.preprocessing import StandardScaler
normalizer=StandardScaler()
X_train=normalizer.fit_transform(X_train)
X_val = normalizer.transform(X_val)
print(X_train.shape)
print(X_val.shape)
(149472, 11)
(26879, 11)
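Note that the scaler is fit on the training split only and then reused on the validation split, so both are expressed in training-set units and no validation statistics leak into training; a minimal sketch:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([[1.0], [2.0], [3.0]])
val = np.array([[2.0], [4.0]])

scaler = StandardScaler().fit(train)  # mean/std come from the training data only
train_s = scaler.transform(train)
val_s = scaler.transform(val)         # validation reuses the training statistics
print(train_s.ravel(), val_s.ravel())
```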

Training the Model

In [56]:
# Importing Keras libraries
import keras
from keras.layers import Input,Dense,BatchNormalization, Activation, Dropout, Add
from keras.models import Model
from keras.utils.vis_utils import plot_model
from keras import optimizers
In [64]:
epochs=50
number_of_classes=4
batch_size=150
In [65]:
inpt = Input(shape=(11,), name='input')

x1 = Dense(units=20)(inpt)
x1 = BatchNormalization()(x1)
x1 = Activation('relu')(x1)

x2 = Dense(units=20)(x1)
x2 = BatchNormalization()(x2)
x2 = Activation('relu')(x2)
x2 = Dropout(0.4)(x2)

x3 = Dense(units=20)(x2)
x3 = BatchNormalization()(x3)
x3 = Activation('relu')(x3)

x4 = Add()([x1,x3])
x4 = BatchNormalization()(x4)
x4 = Activation('relu')(x4)
x4 = Dropout(0.2)(x4)


out = Dense(units=number_of_classes,activation='softmax')(x4)
model = Model(inputs=inpt, outputs=out)
In [66]:
model.summary()
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input (InputLayer)              (None, 11)           0                                            
__________________________________________________________________________________________________
dense_22 (Dense)                (None, 20)           240         input[0][0]                      
__________________________________________________________________________________________________
batch_normalization_20 (BatchNo (None, 20)           80          dense_22[0][0]                   
__________________________________________________________________________________________________
activation_20 (Activation)      (None, 20)           0           batch_normalization_20[0][0]     
__________________________________________________________________________________________________
dense_23 (Dense)                (None, 20)           420         activation_20[0][0]              
__________________________________________________________________________________________________
batch_normalization_21 (BatchNo (None, 20)           80          dense_23[0][0]                   
__________________________________________________________________________________________________
activation_21 (Activation)      (None, 20)           0           batch_normalization_21[0][0]     
__________________________________________________________________________________________________
dropout_10 (Dropout)            (None, 20)           0           activation_21[0][0]              
__________________________________________________________________________________________________
dense_24 (Dense)                (None, 20)           420         dropout_10[0][0]                 
__________________________________________________________________________________________________
batch_normalization_22 (BatchNo (None, 20)           80          dense_24[0][0]                   
__________________________________________________________________________________________________
activation_22 (Activation)      (None, 20)           0           batch_normalization_22[0][0]     
__________________________________________________________________________________________________
add_4 (Add)                     (None, 20)           0           activation_20[0][0]              
                                                                 activation_22[0][0]              
__________________________________________________________________________________________________
batch_normalization_23 (BatchNo (None, 20)           80          add_4[0][0]                      
__________________________________________________________________________________________________
activation_23 (Activation)      (None, 20)           0           batch_normalization_23[0][0]     
__________________________________________________________________________________________________
dropout_11 (Dropout)            (None, 20)           0           activation_23[0][0]              
__________________________________________________________________________________________________
dense_25 (Dense)                (None, 4)            84          dropout_11[0][0]                 
==================================================================================================
Total params: 1,484
Trainable params: 1,324
Non-trainable params: 160
__________________________________________________________________________________________________
In [67]:
checkpoint = keras.callbacks.ModelCheckpoint('weights.{epoch:02d}-{val_loss:.2f}.hdf5', monitor='val_loss', verbose=0, save_best_only=True, save_weights_only=False, mode='auto', period=1)
adam = optimizers.Adam(lr=0.01, decay=0.0001)
model.compile(optimizer = adam, loss = 'categorical_crossentropy', metrics = ['accuracy'])
output = model.fit(x=X_train,y=y_train,batch_size=batch_size, epochs=epochs,validation_data = (X_val,y_val), callbacks=[checkpoint])
Train on 149472 samples, validate on 26879 samples
Epoch 1/50
149472/149472 [==============================] - 7s 49us/step - loss: 0.2283 - acc: 0.9080 - val_loss: 0.0770 - val_acc: 0.9733
Epoch 2/50
149472/149472 [==============================] - 6s 38us/step - loss: 0.1580 - acc: 0.9390 - val_loss: 0.0729 - val_acc: 0.9741
Epoch 3/50
149472/149472 [==============================] - 6s 38us/step - loss: 0.1473 - acc: 0.9441 - val_loss: 0.0657 - val_acc: 0.9761
Epoch 4/50
149472/149472 [==============================] - 6s 39us/step - loss: 0.1421 - acc: 0.9457 - val_loss: 0.0674 - val_acc: 0.9744
Epoch 5/50
149472/149472 [==============================] - 6s 38us/step - loss: 0.1356 - acc: 0.9485 - val_loss: 0.0610 - val_acc: 0.9786
Epoch 6/50
149472/149472 [==============================] - 6s 38us/step - loss: 0.1326 - acc: 0.9497 - val_loss: 0.0626 - val_acc: 0.9773
Epoch 7/50
149472/149472 [==============================] - 6s 41us/step - loss: 0.1280 - acc: 0.9520 - val_loss: 0.0565 - val_acc: 0.9788
Epoch 8/50
149472/149472 [==============================] - 6s 39us/step - loss: 0.1245 - acc: 0.9535 - val_loss: 0.0559 - val_acc: 0.9794
Epoch 9/50
149472/149472 [==============================] - 6s 39us/step - loss: 0.1233 - acc: 0.9532 - val_loss: 0.0523 - val_acc: 0.9805
Epoch 10/50
149472/149472 [==============================] - 6s 40us/step - loss: 0.1217 - acc: 0.9542 - val_loss: 0.0531 - val_acc: 0.9808
Epoch 11/50
149472/149472 [==============================] - 6s 38us/step - loss: 0.1177 - acc: 0.9556 - val_loss: 0.0539 - val_acc: 0.9791
Epoch 12/50
149472/149472 [==============================] - 6s 39us/step - loss: 0.1179 - acc: 0.9560 - val_loss: 0.0524 - val_acc: 0.9805
Epoch 13/50
149472/149472 [==============================] - 6s 40us/step - loss: 0.1159 - acc: 0.9565 - val_loss: 0.0632 - val_acc: 0.9754
Epoch 14/50
149472/149472 [==============================] - 6s 39us/step - loss: 0.1149 - acc: 0.9570 - val_loss: 0.0515 - val_acc: 0.9809
Epoch 15/50
149472/149472 [==============================] - 6s 41us/step - loss: 0.1139 - acc: 0.9579 - val_loss: 0.0558 - val_acc: 0.9778
Epoch 16/50
149472/149472 [==============================] - 6s 39us/step - loss: 0.1131 - acc: 0.9579 - val_loss: 0.0497 - val_acc: 0.9817
Epoch 17/50
149472/149472 [==============================] - 6s 40us/step - loss: 0.1121 - acc: 0.9578 - val_loss: 0.0499 - val_acc: 0.9812
Epoch 18/50
149472/149472 [==============================] - 6s 39us/step - loss: 0.1113 - acc: 0.9586 - val_loss: 0.0516 - val_acc: 0.9806
Epoch 19/50
149472/149472 [==============================] - 6s 40us/step - loss: 0.1119 - acc: 0.9583 - val_loss: 0.0569 - val_acc: 0.9766
Epoch 20/50
149472/149472 [==============================] - 6s 40us/step - loss: 0.1117 - acc: 0.9585 - val_loss: 0.0500 - val_acc: 0.9826
Epoch 21/50
149472/149472 [==============================] - 6s 40us/step - loss: 0.1104 - acc: 0.9587 - val_loss: 0.0532 - val_acc: 0.9796
Epoch 22/50
149472/149472 [==============================] - 6s 40us/step - loss: 0.1086 - acc: 0.9598 - val_loss: 0.0471 - val_acc: 0.9824
Epoch 23/50
149472/149472 [==============================] - 6s 40us/step - loss: 0.1082 - acc: 0.9599 - val_loss: 0.0534 - val_acc: 0.9789
Epoch 24/50
149472/149472 [==============================] - 6s 40us/step - loss: 0.1064 - acc: 0.9595 - val_loss: 0.0621 - val_acc: 0.9758
Epoch 25/50
149472/149472 [==============================] - 6s 40us/step - loss: 0.1065 - acc: 0.9607 - val_loss: 0.0507 - val_acc: 0.9802
Epoch 26/50
149472/149472 [==============================] - 7s 46us/step - loss: 0.1072 - acc: 0.9599 - val_loss: 0.0509 - val_acc: 0.9810
Epoch 27/50
149472/149472 [==============================] - 6s 43us/step - loss: 0.1074 - acc: 0.9602 - val_loss: 0.0535 - val_acc: 0.9784
Epoch 28/50
149472/149472 [==============================] - 6s 40us/step - loss: 0.1056 - acc: 0.9606 - val_loss: 0.0464 - val_acc: 0.9825
Epoch 29/50
149472/149472 [==============================] - 6s 40us/step - loss: 0.1049 - acc: 0.9611 - val_loss: 0.0524 - val_acc: 0.9799
Epoch 30/50
149472/149472 [==============================] - 6s 40us/step - loss: 0.1057 - acc: 0.9605 - val_loss: 0.0462 - val_acc: 0.9836
Epoch 31/50
149472/149472 [==============================] - 6s 40us/step - loss: 0.1059 - acc: 0.9607 - val_loss: 0.0513 - val_acc: 0.9801
Epoch 32/50
149472/149472 [==============================] - 6s 40us/step - loss: 0.1038 - acc: 0.9616 - val_loss: 0.0461 - val_acc: 0.9829
Epoch 33/50
149472/149472 [==============================] - 6s 41us/step - loss: 0.1046 - acc: 0.9617 - val_loss: 0.0459 - val_acc: 0.9828
Epoch 34/50
149472/149472 [==============================] - 6s 40us/step - loss: 0.1033 - acc: 0.9615 - val_loss: 0.0491 - val_acc: 0.9814
Epoch 35/50
149472/149472 [==============================] - 6s 41us/step - loss: 0.1024 - acc: 0.9620 - val_loss: 0.0499 - val_acc: 0.9812
Epoch 36/50
149472/149472 [==============================] - 6s 40us/step - loss: 0.1036 - acc: 0.9621 - val_loss: 0.0458 - val_acc: 0.9832
Epoch 37/50
149472/149472 [==============================] - 6s 41us/step - loss: 0.1028 - acc: 0.9622 - val_loss: 0.0463 - val_acc: 0.9834
Epoch 38/50
149472/149472 [==============================] - 6s 41us/step - loss: 0.1027 - acc: 0.9628 - val_loss: 0.0455 - val_acc: 0.9834
Epoch 39/50
149472/149472 [==============================] - 6s 41us/step - loss: 0.1013 - acc: 0.9629 - val_loss: 0.0446 - val_acc: 0.9835
Epoch 40/50
149472/149472 [==============================] - 6s 41us/step - loss: 0.1017 - acc: 0.9626 - val_loss: 0.0466 - val_acc: 0.9835
Epoch 41/50
149472/149472 [==============================] - 6s 41us/step - loss: 0.1003 - acc: 0.9635 - val_loss: 0.0458 - val_acc: 0.9835
Epoch 42/50
149472/149472 [==============================] - 6s 42us/step - loss: 0.1015 - acc: 0.9625 - val_loss: 0.0450 - val_acc: 0.9833
Epoch 43/50
149472/149472 [==============================] - 6s 42us/step - loss: 0.1004 - acc: 0.9634 - val_loss: 0.0476 - val_acc: 0.9827
Epoch 44/50
149472/149472 [==============================] - 6s 41us/step - loss: 0.1008 - acc: 0.9631 - val_loss: 0.0475 - val_acc: 0.9827
Epoch 45/50
149472/149472 [==============================] - 6s 40us/step - loss: 0.1016 - acc: 0.9629 - val_loss: 0.0501 - val_acc: 0.9819
Epoch 46/50
149472/149472 [==============================] - 6s 40us/step - loss: 0.0995 - acc: 0.9633 - val_loss: 0.0471 - val_acc: 0.9832
Epoch 47/50
149472/149472 [==============================] - 6s 43us/step - loss: 0.0995 - acc: 0.9634 - val_loss: 0.0442 - val_acc: 0.9839
Epoch 48/50
149472/149472 [==============================] - 6s 42us/step - loss: 0.1000 - acc: 0.9634 - val_loss: 0.0454 - val_acc: 0.9837
Epoch 49/50
149472/149472 [==============================] - 6s 40us/step - loss: 0.0999 - acc: 0.9635 - val_loss: 0.0477 - val_acc: 0.9823
Epoch 50/50
149472/149472 [==============================] - 6s 40us/step - loss: 0.1001 - acc: 0.9636 - val_loss: 0.0455 - val_acc: 0.9841

After only 50 epochs the validation loss is already below the training loss. This is expected here (dropout is active only during training, and the training set contains synthetic SMOTE samples), and together with the high validation accuracy it suggests the model generalizes well.

In [68]:
accs=output.history['acc']
val_accs=output.history['val_acc']
x_axis=[i+1 for i in range(epochs)]
plt.plot(x_axis,accs)
plt.plot(x_axis,val_accs)
plt.show()
In [69]:
loss=output.history['loss']
val_loss=output.history['val_loss']
x_axis=[i+1 for i in range(epochs)]
plt.plot(x_axis,loss)
plt.plot(x_axis,val_loss)
plt.show()
In [70]:
# Loading the checkpointed model that achieved the lowest validation loss
from keras.models import load_model
model = load_model('/home/gaurav/Desktop/Social Cops/socialcops_challenge/socialcops_challenge/weights.47-0.04.hdf5')

Classification Report

In [71]:
from sklearn.metrics import classification_report
y_val_pred=np.argmax(model.predict(X_val),axis=1)
y_val_pred=y_val_pred+1
y_val_n = np.argmax(y_val,axis=1)
y_val_n+=1
print(classification_report(y_val_n, y_val_pred))
              precision    recall  f1-score   support

           1       1.00      1.00      1.00     12551
           2       1.00      1.00      1.00      7490
           3       0.95      0.97      0.96      4583
           4       0.92      0.90      0.91      2255

   micro avg       0.98      0.98      0.98     26879
   macro avg       0.97      0.97      0.97     26879
weighted avg       0.98      0.98      0.98     26879
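A confusion matrix complements the per-class report by showing exactly which classes get confused with which; a sketch on toy labels (the real call would use `y_val_n` and `y_val_pred`):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 2, 3, 4, 4, 3, 2])   # hypothetical ground truth
y_pred = np.array([1, 1, 2, 3, 4, 3, 3, 2])   # hypothetical predictions

# rows are true classes, columns are predicted classes, in the order given
cm = confusion_matrix(y_true, y_pred, labels=[1, 2, 3, 4])
print(cm)
```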

Making predictions

In [74]:
test = test.drop('I6',axis=1)
X_test = np.array(test)
X_test = normalizer.transform(X_test)
In [75]:
X_test.shape
Out[75]:
(2000000, 11)
In [76]:
y_pred = np.argmax(model.predict(X_test), axis=1)
In [77]:
print(y_pred)
[3 2 0 ..., 3 2 3]
In [78]:
y_pred = y_pred+1
In [79]:
print(y_pred)
[4 3 1 ..., 4 3 4]
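`argmax` returns 0-based column indices while the classes are labelled 1..4, hence the `+1` shift above; a sketch on hypothetical softmax outputs:

```python
import numpy as np

probs = np.array([[0.70, 0.10, 0.10, 0.10],   # hypothetical softmax rows
                  [0.10, 0.20, 0.60, 0.10],
                  [0.05, 0.05, 0.10, 0.80]])

# argmax gives column indices 0..3; adding 1 maps them back to classes 1..4
pred = np.argmax(probs, axis=1) + 1
print(pred)
```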
In [80]:
test['target'] = y_pred
In [81]:
test
Out[81]:
X1 X2 X3 X4 X5 X6 I1 I2 I3 I4 I5 target
0 338 554 698 1605 1752 1310 0.393834 -0.350045 1.565423 0.311659 0.304781 4
1 667 976 1187 1834 1958 1653 0.214167 -0.181467 1.050679 0.196439 0.164085 3
2 249 420 402 1635 1318 736 0.605302 -0.712650 2.268984 0.441984 0.293497 1
3 111 348 279 1842 743 328 0.736917 -1.162062 3.074176 0.551699 0.080725 1
4 349 559 642 1534 1544 989 0.409926 -0.406678 1.607795 0.323984 0.212753 4
5 363 496 501 833 800 564 0.248876 -0.269084 0.821571 0.215513 0.059155 4
6 538 821 837 1224 1159 1034 0.187773 -0.215049 0.792773 0.189002 0.105291 4
7 560 683 680 1018 965 764 0.199058 -0.225785 0.759024 0.168196 0.058172 4
8 607 758 818 1164 1374 1142 0.174571 -0.091829 0.686800 0.153604 0.165306 4
9 231 434 363 2102 1167 568 0.705477 -0.991497 3.041530 0.509306 0.220193 1
10 343 553 568 1848 1658 971 0.529801 -0.583994 2.161743 0.395141 0.261858 1
11 317 515 542 1993 1668 952 0.572387 -0.661160 2.398101 0.415042 0.274431 1
12 428 610 715 1675 2039 1402 0.401674 -0.303666 1.575253 0.307034 0.324516 1
13 245 476 364 2633 1510 689 0.757090 -1.028150 3.525145 0.544320 0.308642 1
14 435 624 774 1308 1652 1354 0.256484 -0.140268 0.981512 0.220064 0.272556 4
15 241 477 481 1481 1009 609 0.509684 -0.699242 2.004012 0.403231 0.117431 3
16 614 856 1109 1913 2511 1817 0.266049 -0.130877 1.208783 0.221229 0.241969 4
17 487 694 824 1462 1800 1509 0.279090 -0.175473 1.117066 0.234773 0.293613 3
18 494 743 1018 2184 2553 1668 0.364147 -0.286250 1.694130 0.282881 0.241996 1
19 357 613 664 1848 1600 1031 0.471338 -0.543263 2.016360 0.366645 0.216519 3
20 566 823 1118 2129 3015 2073 0.311364 -0.139125 1.409616 0.250482 0.299279 1
21 469 670 718 1633 1505 1051 0.389196 -0.429986 1.633408 0.304650 0.188242 4
22 277 496 427 2247 1396 704 0.680628 -0.914227 3.015378 0.491905 0.244916 1
23 342 531 548 1674 1644 1007 0.506751 -0.515792 1.954790 0.379637 0.295177 1
24 434 637 681 2139 1792 1098 0.517021 -0.605294 2.325445 0.377615 0.234401 1
25 224 409 337 1737 1107 557 0.675024 -0.896543 2.625205 0.494894 0.246085 1
26 557 705 768 975 876 642 0.118761 -0.172245 0.481135 0.116682 -0.089362 4
27 505 738 954 1858 2583 1802 0.321479 -0.158228 1.356526 0.259139 0.307692 1
28 462 680 841 2092 2114 1329 0.426526 -0.421295 1.928958 0.320647 0.224885 1
29 327 558 632 1625 1358 937 0.439965 -0.529472 1.818120 0.346656 0.194391 3
... ... ... ... ... ... ... ... ... ... ... ... ...
1999970 419 679 750 2004 1911 1189 0.455338 -0.479092 2.004157 0.351849 0.226405 3
1999971 664 826 899 1390 1580 1254 0.214504 -0.150531 0.900955 0.179740 0.164886 4
1999972 554 837 900 1543 1417 1175 0.263201 -0.305769 1.181858 0.235089 0.132530 4
1999973 174 374 285 2217 1130 531 0.772182 -1.096951 3.339483 0.556301 0.301471 1
1999974 411 602 646 1717 1598 1031 0.453237 -0.489135 1.860149 0.343100 0.229577 1
1999975 601 833 1157 2257 3342 2235 0.322203 -0.128418 1.470068 0.250962 0.317807 1
1999976 703 922 1047 2023 2266 1718 0.317915 -0.261259 1.490293 0.250664 0.242676 4
1999977 260 419 418 1505 1293 748 0.565263 -0.641031 2.054971 0.415263 0.283019 1
1999978 733 1093 1486 2639 2898 2125 0.279515 -0.232739 1.549502 0.234406 0.176959 4
1999979 355 590 643 1938 1815 1075 0.501744 -0.534517 2.113881 0.378977 0.251455 1
1999980 290 560 624 1833 1891 1253 0.492063 -0.476489 1.981169 0.384303 0.335109 3
1999981 229 446 399 2437 1479 730 0.718618 -0.963255 3.256738 0.509742 0.293180 1
1999982 430 589 668 1768 1986 1253 0.451560 -0.393489 1.795335 0.332552 0.304529 1
1999983 333 583 592 2101 1630 957 0.560342 -0.686581 2.470453 0.417859 0.235636 1
1999984 341 628 719 2140 1928 1175 0.497027 -0.549141 2.227941 0.380862 0.240760 1
1999985 383 551 565 1741 1650 962 0.509974 -0.536810 2.019498 0.374849 0.259987 1
1999986 483 680 699 1743 1217 810 0.427518 -0.605221 1.918910 0.327067 0.073559 4
1999987 345 584 544 2192 1619 892 0.602339 -0.752693 2.669548 0.443161 0.242340 1
1999988 373 611 686 1755 1740 1176 0.437935 -0.442227 1.808232 0.342528 0.263158 3
1999989 338 610 672 2345 2064 1196 0.554524 -0.618258 2.519567 0.410202 0.280514 1
1999990 368 609 633 901 847 708 0.174707 -0.205599 0.641009 0.189962 0.055928 4
1999991 195 360 342 1645 1031 557 0.655762 -0.885209 2.518845 0.472496 0.239155 1
1999992 615 857 1205 2085 3127 2303 0.267477 -0.067554 1.218935 0.218940 0.312999 4
1999993 210 420 356 2135 1424 677 0.714171 -0.913946 2.982030 0.515299 0.310745 1
1999994 468 668 786 1379 1654 1388 0.273903 -0.183234 1.076759 0.232095 0.276909 3
1999995 232 500 481 1715 1088 613 0.561931 -0.785620 2.330792 0.438636 0.120658 3
1999996 231 407 413 1363 1041 595 0.534910 -0.668853 1.937565 0.405925 0.180556 1
1999997 407 529 607 857 908 828 0.170765 -0.141870 0.595069 0.155119 0.154007 4
1999998 376 589 607 1543 1339 903 0.435349 -0.506133 1.743528 0.342194 0.196026 3
1999999 567 858 989 2121 2249 1689 0.363987 -0.334697 1.712402 0.292641 0.261389 4

2000000 rows × 12 columns

In [82]:
temp=pd.read_csv('/home/gaurav/Desktop/Social Cops/socialcops_challenge/socialcops_challenge/land_test.csv')
df=pd.DataFrame(data=test)
df['I6']=temp['I6']
target=df['target']
df=df.drop(['target'],axis=1)
df['target']=target
print(df)
df.to_csv('labelled_land_test.csv',index=False)
          X1   X2    X3    X4    X5    X6        I1        I2        I3        I4        I5        I6  target
0        338  554   698  1605  1752  1310  0.393834 -0.350045  1.565423  0.311659  0.304781 -0.043789       4
1        667  976  1187  1834  1958  1653  0.214167 -0.181467  1.050679  0.196439  0.164085 -0.032700       3
2        249  420   402  1635  1318   736  0.605302 -0.712650  2.268984  0.441984  0.293497  0.107348       1
3        111  348   279  1842   743   328  0.736917 -1.162062  3.074176  0.551699  0.080725  0.425145       1
4        349  559   642  1534  1544   989  0.409926 -0.406678  1.607795  0.323984  0.212753 -0.003249       4
...      ...  ...   ...   ...   ...   ...       ...       ...       ...       ...       ...       ...     ...
1999999  567  858   989  2121  2249  1689  0.363987 -0.334697  1.712402  0.292641  0.261389 -0.029291       4

[2000000 rows x 13 columns]
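The drop-and-reassign dance above can be written more concisely with `DataFrame.pop`, which removes a column and returns it in one step. A minimal sketch on a small stand-in frame (the tiny data here is illustrative, not the notebook's 2,000,000-row test set):

```python
import pandas as pd

# Small stand-in frame; the real notebook operates on the full test set.
df = pd.DataFrame({'X1': [338, 667],
                   'target': [4, 3],
                   'I6': [-0.043789, -0.032700]})

# pop() removes `target` and returns it as a Series, so a single
# assignment moves the column to the end of the frame.
df['target'] = df.pop('target')

print(list(df.columns))  # → ['X1', 'I6', 'target']
```

This avoids the temporary `target` variable and the separate `drop` call while producing the same column order.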